Winner's Curse Correction and Variable Thresholding Improve Performance of Polygenic Risk Modeling Based on Genome-Wide Association Study Summary-Level Data
Authors
Abstract
Recent heritability analyses have indicated that genome-wide association studies (GWAS) have the potential to improve genetic risk prediction for complex diseases based on the polygenic risk score (PRS), a simple modelling technique that can be implemented using summary-level data from the discovery samples. We herein propose modifications to improve the performance of PRS. We introduce threshold-dependent winner's-curse adjustments for the marginal association coefficients that are used to weight the single-nucleotide polymorphisms (SNPs) in PRS. Further, as a way to incorporate external functional/annotation knowledge that could identify subsets of SNPs highly enriched for associations, we propose variable thresholds for SNP selection. We applied our methods to GWAS summary-level data of 14 complex diseases. Across all diseases, a simple winner's curse correction uniformly enhanced the performance of the models, whereas incorporation of functional SNPs was beneficial only for selected diseases. Compared to the standard PRS algorithm, the proposed methods in combination led to a notable gain in efficiency (25-50% increase in the prediction R²) for 5 of 14 diseases. As an example, for GWAS of type 2 diabetes, winner's curse correction improved prediction R² from 2.29% based on the standard PRS to 3.10% (P = 0.0017), and incorporating functional annotation data further improved R² to 3.53% (P = 2×10⁻⁵). Our simulation studies illustrate why differential treatment of certain categories of functional SNPs, even when shown to be highly enriched for GWAS heritability, does not lead to proportionate improvement in genetic risk prediction because of non-uniform linkage disequilibrium structure.
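The two ideas in the abstract, shrinking the SNP weights to offset the winner's curse and letting the selection threshold vary by SNP category, can be illustrated with a minimal sketch. It assumes a soft-thresholding (lasso-type) shrinkage, which is one simple form of threshold-dependent correction and is not confirmed here as the authors' exact estimator. The function names, the use of a z-score cutoff in place of a p-value cutoff, and the per-SNP `z_cut` array are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold_weights(beta, se, z_cut):
    """Shrink each marginal estimate toward zero by z_cut * se.

    Assumption: a lasso-type shrinkage stands in for a winner's-curse
    correction; the source abstract does not specify the estimator.
    """
    beta = np.asarray(beta, dtype=float)
    se = np.asarray(se, dtype=float)
    # z_cut may be a scalar (one threshold) or one value per SNP.
    z_cut = np.broadcast_to(np.asarray(z_cut, dtype=float), beta.shape)
    return np.sign(beta) * np.maximum(np.abs(beta) - z_cut * se, 0.0)


def polygenic_score(genotypes, beta, se, z_cut, correct=True):
    """Weighted allele-count sum over SNPs whose |z| reaches the threshold.

    genotypes: (individuals x SNPs) allele counts.
    z_cut: scalar for standard thresholding, or a per-SNP array to mimic
           variable thresholds (e.g. a laxer cutoff for annotated SNPs).
    correct: if False, use the raw marginal estimates as weights.
    """
    beta = np.asarray(beta, dtype=float)
    se = np.asarray(se, dtype=float)
    z_cut = np.broadcast_to(np.asarray(z_cut, dtype=float), beta.shape)
    keep = np.abs(beta / se) >= z_cut
    weights = soft_threshold_weights(beta, se, z_cut) if correct else beta
    return np.asarray(genotypes, dtype=float) @ np.where(keep, weights, 0.0)
```

Passing a scalar `z_cut` with `correct=False` reproduces ordinary thresholded PRS in this sketch, while passing an array such as a lower cutoff for SNPs in an enriched annotation category mimics variable thresholding. Note that the shrinkage amount depends on the threshold applied to each SNP, which is what makes the adjustment threshold-dependent.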
Similar articles
Estimating the Total Number of Susceptibility Variants Underlying Complex Diseases from Genome-Wide Association Studies
Recently, genome-wide association studies (GWAS) have identified numerous susceptibility variants for complex diseases. In this study we proposed several approaches to estimate the total number of variants underlying these diseases. We assume that the variance explained by genetic markers (Vg) follows an exponential distribution, which is justified by previous studies on theories of adaptation. O...
Association between polygenic risk for schizophrenia, neurocognition and social cognition across development
Breakthroughs in genomics have begun to unravel the genetic architecture of schizophrenia risk, providing methods for quantifying schizophrenia polygenic risk based on common genetic variants. Our objective in the current study was to understand the relationship between schizophrenia genetic risk variants and neurocognitive development in healthy individuals. We first used combined genomic and ...
Illustrating, Quantifying, and Correcting for Bias in Post-hoc Analysis of Gene-Based Rare Variant Tests of Association
To date, gene-based rare variant testing approaches have focused on aggregating information across sets of variants to maximize statistical power in identifying genes showing significant association with diseases. Beyond identifying genes that are associated with diseases, the identification of causal variant(s) in those genes and estimation of their effect is crucial for planning replication s...
Quantifying and correcting for the winner's curse in quantitative-trait association studies.
Quantitative traits (QT) are an important focus of human genetic studies both because of interest in the traits themselves and because of their role as risk factors for many human diseases. For large-scale QT association studies including genome-wide association studies, investigators usually focus on genetic loci showing significant evidence for SNP-QT association, and genetic effect size tend...
Explicit Modeling of Ancestry Improves Polygenic Risk Scores and BLUP Prediction.
Polygenic prediction using genome-wide SNPs can provide high prediction accuracy for complex traits. Here, we investigate the question of how to account for genetic ancestry when conducting polygenic prediction. We show that the accuracy of polygenic prediction in structured populations may be partly due to genetic ancestry. However, we hypothesized that explicitly modeling ancestry could impro...